
[core] Fix peft multi-gpu issue #145

Merged (7 commits, Mar 14, 2023)

Conversation

younesbelkada (Contributor)

What does this PR do?

This PR fixes the peft + LoRA multi-GPU issue. To reproduce, run:

import os
os.environ["CUDA_VISIBLE_DEVICES"]="0,1"
import torch
import torch.nn as nn
import bitsandbytes as bnb
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM

free_in_GB = int(torch.cuda.mem_get_info()[0]/1024**3)
max_memory = f'{free_in_GB-2}GB'

n_gpus = torch.cuda.device_count()
max_memory = {i: max_memory for i in range(n_gpus)}

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m", 
    device_map='auto',
    max_memory=max_memory
)
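# device_map='auto' asks accelerate to dispatch the layers across the visible GPUs
# (naive pipeline parallelism); max_memory caps how much each GPU may hold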

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-6.7b")  # all OPT checkpoints share the same tokenizer

print(model)

for param in model.parameters():
  param.requires_grad = False  # freeze the model - train adapters later
  if param.ndim == 1:
    # cast the small parameters (e.g. layernorm) to fp32 for stability
    param.data = param.data.to(torch.float32)

#model.gradient_checkpointing_enable()  # reduce number of stored activations
#model.model.decoder.project_in = lambda x: x.requires_grad_(True)

class CastOutputToFloat(nn.Sequential):
  def forward(self, x): return super().forward(x).to(torch.float32)
model.lm_head = CastOutputToFloat(model.lm_head)
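# wrapping lm_head this way keeps the logits in fp32, which stabilizes the loss under fp16 training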

def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

from peft import LoraConfig, get_peft_model 

config = LoraConfig(
    r=64,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "out_proj", "fc1", "fc2"],
    lora_dropout=0.01,
    bias="none",
    task_type="CAUSAL_LM"
)
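# inject LoRA adapters into the listed OPT attention/MLP projections; after get_peft_model
# only the adapter weights remain trainable (the base weights were frozen above)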
model = get_peft_model(model, config)
print_trainable_parameters(model)


import transformers
from datasets import load_dataset
#data = load_dataset("lambada")
#data = data.map(lambda samples: tokenizer(samples['text']), batched=True)
data = load_dataset("Abirate/english_quotes")
data = data.map(lambda samples: tokenizer(samples['quote']), batched=True)

trainer = transformers.Trainer(
    model=model, 
    train_dataset=data['train'],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=4, 
        gradient_accumulation_steps=4,
        warmup_steps=10, 
        max_steps=20, 
        learning_rate=3e-4, 
        fp16=True,
        logging_steps=1, 
        output_dir='outputs'
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()


batch = tokenizer("Two things are infinite: ", return_tensors='pt')

model.config.use_cache = True  # re-enable the cache for generation
model.eval()
with torch.cuda.amp.autocast():
  output_tokens = model.generate(**batch, max_new_tokens=50)

print('\n\n', tokenizer.decode(output_tokens[0], skip_special_tokens=True))
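
As a quick sanity check that device_map='auto' really sharded the model over both GPUs, something like the following can be run after loading (a minimal sketch that just counts parameters per device):

from collections import defaultdict

params_per_device = defaultdict(int)
for _, param in model.named_parameters():
    params_per_device[str(param.device)] += param.numel()
# with naive pipeline parallelism both cuda:0 and cuda:1 should show up here
print(dict(params_per_device))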

cc @pacman100

younesbelkada requested a review from pacman100 on Mar 2, 2023
djaym7 commented Mar 6, 2023

Installed this and it works.

pacman100 (Contributor) left a comment


Hello @younesbelkada, thanks a lot for this PR enabling tuning with naive pipeline parallelism across a multi-GPU setup. Left a few nits.

    return F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
elif self.r > 0 and not self.merged:
    result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
    if self.r > 0:
        result += self.lora_B(self.lora_A(self.lora_dropout(x))) * self.scaling
    result = result

pacman100: This can be removed, I think.

@@ -339,17 +344,20 @@ def eval(self):
        self.lora_B.eval()

    def forward(self, x: torch.Tensor):


pacman100: This can be removed, I think.

younesbelkada (Contributor, Author)

Thanks a lot for your review! Fixed the comments! Merging
